DO NOT MERGE: ClusterExtensionRevision API Review#2610
DO NOT MERGE: ClusterExtensionRevision API Review#2610perdasilva wants to merge 1 commit intoopenshift:masterfrom
Conversation
Signed-off-by: Per Goncalves da Silva <pegoncal@redhat.com>
|
Pipeline controller notification For optional jobs, comment This repository is configured in: LGTM mode |
|
Important Review skippedAuto reviews are limited based on label configuration. 🚫 Excluded labels (none allowed) (1)
Please check the settings in the CodeRabbit UI or the You can disable this status message by setting the ✨ Finishing touches🧪 Generate unit tests (beta)
Comment |
|
Skipping CI for Draft Pull Request. |
|
Hello @perdasilva! Some important instructions when contributing to openshift/api: |
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
|
Following up here - this is in my queue, but I'm working through reading the associated RFE first to ensure I've got full context before reviewing. I should have an initial review within the next day or so. |
everettraven
left a comment
There was a problem hiding this comment.
First pass, a handful of comments.
It might be helpful to run the latest version of linter against this API to catch some of the smaller things our linter is good at catching :).
| // | ||
| // Once a revision is set to "Archived", it cannot be un-archived. | ||
| // | ||
| // +kubebuilder:default="Active" |
There was a problem hiding this comment.
Should this be a required input instead of defaulting to Active?
What does it mean for multiple ClusterExtensionRevision objects for the same ClusterExtension to be Active?
There was a problem hiding this comment.
Active is the current one, Archive are the old states.
CER are like snapshot of the CE in certain period of time
There was a problem hiding this comment.
So there can never be more than 1 CER marked as Active?
Also, if this field is required, it shouldn't be defaulted. Required with a default is contradictory.
There was a problem hiding this comment.
During an upgrade, while a new ClusterExtensionRevision is being rolled out and before the old revision has begun to be torn down, both CERs will be marked as Active.
I'm in agreement on the defaulting - plus when we create revisions we should be setting it explicitly anyway rather than relying on defaulting.
There was a problem hiding this comment.
Please make sure there is a clear description of what each state means and what it means if you have multiple instances of the Active state.
| // ClusterExtensionRevisionLifecycleStatePaused / "Paused" disables reconciliation of the ClusterExtensionRevision. | ||
| // Object changes will not be reconciled. However, status updates will be propagated. | ||
| ClusterExtensionRevisionLifecycleStatePaused ClusterExtensionRevisionLifecycleState = "Paused" |
There was a problem hiding this comment.
Not used in the API as an allowed value?
There was a problem hiding this comment.
'Pauses' will be removed for now. Even through the underlying library supports it, it's not needed rn.
| // [RFC 1123]: https://tools.ietf.org/html/rfc1123 | ||
| // | ||
| // +kubebuilder:validation:MaxLength=63 | ||
| // +kubebuilder:validation:Pattern=`^[a-z]([-a-z0-9]*[a-z0-9])?$` |
There was a problem hiding this comment.
We discourage the usage of the pattern marker because of the lack of helpful error messages that are returned to end-users (a regex string instead of a sentence explaining the constraints).
Instead, use a CEL expression that enforces this constraint and returns a helpful message like:
api/insights/v1alpha2/types_insights.go
Line 403 in c2a41ea
There was a problem hiding this comment.
To check, what is the minimum KAS version this API is intended to be installed upon. For 1.35 onwards we will prefer format: k8s-short-name. From 1.32 you can use the formats library. If you're supporting versions older than that you'll have to use the regex
There was a problem hiding this comment.
TIL about the new format :)
There was a problem hiding this comment.
TIL here as well - we should use format: k8s-short-name instead. I don't think this API will be active on anything less than 1.32, @perdasilva correct me if I'm wrong?
| // | ||
| // +kubebuilder:validation:MaxLength=63 | ||
| // +kubebuilder:validation:Pattern=`^[a-z]([-a-z0-9]*[a-z0-9])?$` | ||
| Name string `json:"name"` |
There was a problem hiding this comment.
Explicitly mark the field as required.
| Name string `json:"name"` | |
| // +required | |
| Name string `json:"name"` |
There was a problem hiding this comment.
Is the empty string a valid value?
| // +listType=map | ||
| // +listMapKey=name |
There was a problem hiding this comment.
Document the uniqueness constraint keyed on the phase name in the GoDoc.
There was a problem hiding this comment.
Could you please explain this one?
I did not get the request
There was a problem hiding this comment.
The +listType=map and +listMapKey=name markers enforce a uniqueness constraint on this field based on the name of the phase.
Markers are not included in generated documentation so we strongly recommend making all the input constraints explicitly in the GoDoc. I do not see in the GoDoc that would be used in generated documentation for the field mentioning that phase names must be unique, so I'm asking that it be added.
|
|
||
| // collisionProtection controls whether the operator can adopt and modify objects | ||
| // that already exist on the cluster. | ||
| // |
There was a problem hiding this comment.
Nit: We normally try to follow a pattern where we explicitly introduce the allowed values as well with a sentence like:
Allowed values are Prevent, IfNoController, ...
before going into the When ... sections.
| // | ||
| // +kubebuilder:default="Prevent" | ||
| // +kubebuilder:validation:Enum=Prevent;IfNoController;None | ||
| // +optional |
There was a problem hiding this comment.
Might be helpful context, could you elaborate for me why you want this field to be optional?
There was a problem hiding this comment.
Leaving this field as optional helps protect against accidental misuse; the field needs to be one of these three, and requiring every user to choose one of them opens more opportunities to cause problems from controller conflicts.
There was a problem hiding this comment.
Who is it that is creating this resource? My interpretation was that OLM itself would be creating these resources and not end-users - is that incorrect?
There was a problem hiding this comment.
Yes that's correct, and a human shouldn't need to interact with this API apart from some kind of manual intervention. IMO this also helps for the non-human users, in the sense that it may help prevent bugs that create controller vs controller conflicts.
There was a problem hiding this comment.
I'd recommend avoiding defaulting like this when it is a controller that is responsible for creating these resources - it obfuscates whether or not the controller intentionally set a value or if it omitted a value and it defaulted to that value.
If the field is effectively required, make it the responsibility of your controller to always set this value and have your default behavior explicitly encoded in the controller.
| // Use this setting with extreme caution as it may cause multiple controllers to fight over | ||
| // the same resource, resulting in increased load on the API server and etcd. | ||
| // | ||
| // +kubebuilder:default="Prevent" |
There was a problem hiding this comment.
Do you ever intend to modify the default behavior? Done this way, the defaulting logic becomes a part of your API contract.
There was a problem hiding this comment.
I personally can't imagine us changing the defaulting behavior away from Prevent, which is the safest option of the three.
| // | ||
| // +kubebuilder:validation:EmbeddedResource | ||
| // +kubebuilder:pruning:PreserveUnknownFields | ||
| Object unstructured.Unstructured `json:"object"` |
There was a problem hiding this comment.
I could be wrong here, but IIRC most embedded resource types will use https://pkg.go.dev/k8s.io/kubernetes/pkg/runtime#RawExtension as the type there.
I don't have a strong opinion here though as I've not reviewed many APIs that are embedding another resource blob within them. If this is working, 👍
| - jsonPath: .metadata.creationTimestamp | ||
| name: Age | ||
| type: date | ||
| name: v1 |
There was a problem hiding this comment.
v1? Have you shipped this API already? Typically, APIs for experimental type features will start as v1alpha1. In OpenShift, it is expected that TP APIs are v1alphaN until ready to be promoted.
|
@everettraven Thank you for the thoughtful review. Imma queue up an issue to address what you've identified! =D |
Just a note, it may be easiest to get everything to a good state in this temporary PR and then use the diff between what exists in this PR and the upstream APIs for making the changes. That way you don't have to do multiple rounds of iteration. |
| // - When status is True and reason is ProbesSucceeded, the ClusterExtensionRevision has been successfully rolled out and all objects pass their readiness probes. | ||
| // - When status is False and reason is ProbeFailure, one or more objects are failing their readiness probes during rollout. | ||
| // - When status is Unknown and reason is Reconciling, the ClusterExtensionRevision has encountered an error that prevented it from observing the probes. | ||
| // - When status is Unknown and reason is Archived, the ClusterExtensionRevision has been archived and its objects have been torn down. |
There was a problem hiding this comment.
This seems like Available = False might be more appropriate?
There was a problem hiding this comment.
I believe the convention we're trying to set here is that True vs False implies we're actively checking readiness probes to determine the state of installed content. Once a revision has been Archived, we are no longer checking those probes, thus the availability is Unknown. Technically, the resources could still be available if they haven't been completely garbage collected yet, but since this resource is Archived we're no longer paying attention to its resources.
There was a problem hiding this comment.
Gotcha - maybe we can do some updating of the text here to clarify that the unknown status is because it is archived and the system is no longer paying attention to the resources associated with this, as opposed to the current wording where it sounds like it is because the resources associated with it are removed?
OLMv1 ClusterExtensionRevision API Review
This PR is only to facilitate API review.
ClusterExtensionRevision API
ClusterExtensionRevision objects are created by the ClusterExtension controller for each install, upgrade, or reconfiguration operation. They are owned by a ClusterExtension. A ClusterExtensionRevision carries the objects of the revision (i.e. the resources of a package at a given version + any user supplied configuration). We don't expect users to create any clusterextensionrevision objects. We only expect users to interact with this API when something goes wrong, or, in the future, in approval flows (i.e. a revision is kept from rolling out until there's some manual intervention by an user).
Lifecycle
Revision Rollout Strategy
Available=True.status.replicas==.status.updatedReplicasObject ownership is transferred from previous to subsequent revisions.
Once a revision completes, it sets all previous revisions to
Archived. WhenArchivedthe revision will clean up any objects it still manages.Collision Control
Phase objects can be be configured to error on existing, adopt orphan resources, or force adopt resources.
Bundle Rollout Strategy Definition
registry+v1 bundles will have their rollout strategy defined by OLM in both phases and probes. If we directly support the Helm format, OLM will likey also define the rollout strategy.
Currently the registry+v1 bundle strategy defined by sorting the bundle manifests into different phases: namespaces, policies, rbac, crds, etc. with default probes, e.g. CRD has
Established=Truecondition, Deployment hasAvailable=True, etc.Known Upcoming Changes
collisionProtectionout of the object and up to the top levelKnown Limitations
The objects are currently defined inline in the CRD. In the (not so distant) future we will shard the resources across some kind of container (e.g. ConfigMap or Secret).
Future Work (beyond GA)
Open Questions